We've provided the scripts of all the movies, in case your machine doesn't have audio capabilities.
Treeviz | Movie [8.6 MB] | Script
Eviviz | Movie [6.1 MB] | Script
Scatterviz | Movie [9.9 MB] | Script
Mapviz | Movie [6.7 MB] | Script
You are viewing a decision tree which represents a classifier that was automatically built by MineSet to explain when working adults in the US are earning more than $50,000.
The two bars on the root node represent different classes: the pink bar represents the class of people earning under $50,000 and the yellow bar represents the class earning over $50,000.
By pointing to the bars, we can see text at the upper left that shows the exact proportions: 76% of the sample earn under $50,000 while 24% earn over $50,000.
The decision tree tells us that the most important factor for distinguishing the two classes is age. It also determines the threshold age of 27 as crucial. People under the age of 27 are represented in the left child of the tree, while people over 27 are represented by the right subtree.
In the left child, representing people younger than 27, about 97% earn less than $50,000. Going to the right, the decision tree tells us that education is an important factor. The census bureau assigns numbers to the different levels of education. 13 represents a bachelor's degree, and MineSet determined that this was a crucial threshold. One can see that the distribution in the two children is very different. People over 27 without a bachelor's degree are represented in the left subtree, which shows 21% earning over $50,000. The right subtree shows that the bachelor's degree increases the probability to 55%.
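The two splits narrated above can be written out as a small rule function. This is only a sketch of the part of the tree the script describes, not MineSet's actual output: the function name `share_over_50k` and the parameter `education_num` are hypothetical, and sending someone aged exactly 27 to the right subtree is an assumption, since the script only says "under" and "over" 27.

```python
def share_over_50k(age: int, education_num: int) -> float:
    """Approximate share earning over $50,000 in the node reached.

    Only the two splits narrated in the script are modeled; the
    percentages are the ones quoted there.
    """
    if age < 27:  # assumption: exactly 27 falls into the right subtree
        return 0.03  # about 97% earn less than $50,000
    if education_num < 13:  # 13 is the census code for a bachelor's degree
        return 0.21
    return 0.55


print(share_over_50k(24, 13))
print(share_over_50k(40, 9))
print(share_over_50k(40, 13))
```

Compared with the 24% at the root, the three results show how much each branch shifts the proportion.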
The scatter visualizer allows you to look at multi-dimensional data. The data shown represents births in the Netherlands. The axes are the region name, the region population density, and the number of births per 1,000 women. Moving the mouse over a cube displays the information for that cube.
The color of each cube represents the region's total population. There are two sliders on the right side that allow users to slice through the data at different values of the independent variables. In our example, these are the women's age and the year the survey was done.
The data shows women at age 20. To switch to 25, we click on a higher point. We can also animate how the data change over time by plotting a path in the panel. By using the VCR buttons, we can see that the number of births increases as women get older, then decreases as women get past child-bearing age.
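Slicing with the two sliders and animating along a path can be pictured as repeated filtering on the independent variables. The sketch below is illustrative only: the records, region names, and birth rates are made up, and `slice_at` and `animate` are hypothetical helpers, not part of MineSet.

```python
from typing import Dict, List, Tuple

# Made-up records: (region, age, year, births per 1,000 women).
Record = Tuple[str, int, int, float]

RECORDS: List[Record] = [
    ("Region A", 20, 1990, 40.0),
    ("Region B", 20, 1990, 35.0),
    ("Region A", 25, 1990, 110.0),
    ("Region B", 25, 1990, 95.0),
]


def slice_at(records: List[Record], age: int, year: int) -> Dict[str, float]:
    """Keep only the records at one slider position (age, year)."""
    return {region: births
            for region, a, y, births in records
            if a == age and y == year}


def animate(records: List[Record],
            path: List[Tuple[int, int]]) -> List[Dict[str, float]]:
    """Produce one frame per (age, year) point along the plotted path."""
    return [slice_at(records, age, year) for age, year in path]


frames = animate(RECORDS, [(20, 1990), (25, 1990)])
for frame in frames:
    print(frame)
```

Each frame corresponds to what the display would show at one point on the path.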
The scatter visualizer allows you to see 8 dimensions: the three axes, the shapes of entities, their color, their size, and two independent dimensions for animation.
The Evidence Visualizer shows how different factors affect the initial probability that a record belongs to a given class. The data shown is a sample of about 30,000 people based on US census bureau data of adults working in the US.
The right pane shows the initial probability in the data. By pointing to one of the classes we can see the text on the top left indicates that 76% earn under $50,000 and 24% earn over $50,000.
The left pane shows how each attribute value affects the probability. If the two slices are approximately equal, as we can see for the second value of the bottom attribute representing race=white, it indicates that the probability will not change if you know that value for a record. However, if a slice is larger than half, it indicates that the factor will increase the relative importance of the class with the slice color.
For example, the first pie for age, representing age less than or equal to 20, gives high evidence that the person will not be making over $50,000. As people age, the yellow slice increases, showing high salaries increase with age. A slight dropoff occurs after age 61.
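One way to read the pies is as normalized per-class likelihoods that are combined with the initial probability in the style of a Naive Bayes update. The sketch below rests on that assumption; the script does not state the exact formula MineSet uses, and the likelihood values for the "age <= 20" pie are invented for illustration. Only the 76% / 24% prior comes from the script.

```python
from typing import Dict


def evidence_slices(likelihoods: Dict[str, float]) -> Dict[str, float]:
    """Normalize P(value | class) across classes to get pie slice sizes."""
    total = sum(likelihoods.values())
    return {c: p / total for c, p in likelihoods.items()}


def posterior(prior: Dict[str, float],
              likelihoods: Dict[str, float]) -> Dict[str, float]:
    """Update the initial class probabilities with one attribute value."""
    unnorm = {c: prior[c] * likelihoods[c] for c in prior}
    total = sum(unnorm.values())
    return {c: v / total for c, v in unnorm.items()}


prior = {"under_50k": 0.76, "over_50k": 0.24}  # from the right pane

# Equal slices: knowing the value leaves the probabilities unchanged.
neutral = {"under_50k": 0.5, "over_50k": 0.5}
print(posterior(prior, neutral))

# Hypothetical likelihoods for "age <= 20": a pie dominated by one class.
young = {"under_50k": 0.10, "over_50k": 0.01}
print(evidence_slices(young))
print(posterior(prior, young))
```

With equal slices the posterior matches the prior, while a lopsided pie pulls the probability toward the class with the larger slice.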
By rotating the figure, we can see that the pies have heights. The height of a pie is proportional to the number of records. By looking at the sex attribute, we can see that there are more males than females in this sample.
Clicking the button near a class name changes the display to a bar display that emphasizes which values give evidence that increases the probability of a record belonging to the given class.
The Mapviz movie focuses on the MineSet Map Visualizer.
The map visualizer allows you to view data containing geographical relationships.
The data shown represents births in the Netherlands over the different regions. The height of each region represents the number of births. The color, specified at the bottom of the window, indicates the population density of the region. Moving the mouse over a region displays the information for that region.
The two independent variables, age and year, allow you to slice through the data. For example, moving the age slider changes the map. By plotting a path, you can animate over the different age groups, the years, or a combination of both.
As expected, the number of births increases as women age, then decreases as they get past child-bearing age.